

<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" />
  
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  
  <title>Monitoring a Cluster &mdash; Ceph Documentation</title>
  

  
  <link rel="stylesheet" href="../../../_static/ceph.css" type="text/css" />
  <link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
  <link rel="stylesheet" href="../../../_static/graphviz.css" type="text/css" />
  <link rel="stylesheet" href="../../../_static/css/custom.css" type="text/css" />

  
  
    <link rel="shortcut icon" href="../../../_static/favicon.ico"/>
  

  
  

  

  
  <!--[if lt IE 9]>
    <script src="../../../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
    
      <script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
        <script src="../../../_static/jquery.js"></script>
        <script src="../../../_static/underscore.js"></script>
        <script src="../../../_static/doctools.js"></script>
    
    <script type="text/javascript" src="../../../_static/js/theme.js"></script>

    
    <link rel="index" title="Index" href="../../../genindex/" />
    <link rel="search" title="Search" href="../../../search/" />
    <link rel="next" title="Monitoring OSDs and PGs" href="../monitoring-osd-pg/" />
    <link rel="prev" title="Health Checks" href="../health-checks/" /> 
</head>

<body class="wy-body-for-nav">

   
  <header class="top-bar">
    

















<div role="navigation" aria-label="breadcrumbs navigation">

  <ul class="wy-breadcrumbs">
    
      <li><a href="../../../" class="icon icon-home"></a> &raquo;</li>
        
          <li><a href="../../">Ceph Storage Cluster</a> &raquo;</li>
        
          <li><a href="../">Cluster Operations</a> &raquo;</li>
        
      <li>Monitoring a Cluster</li>
    
    
      <li class="wy-breadcrumbs-aside">
        
          
            <a href="../../../_sources/rados/operations/monitoring.rst.txt" rel="nofollow"> View page source</a>
          
        
      </li>
    
  </ul>

  
  <hr/>
</div>
  </header>
  <div class="wy-grid-for-nav">
    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search"  style="background: #eee" >
          

          
            <a href="../../../">
          

          
            
            <img src="../../../_static/logo.png" class="logo" alt="Logo"/>
          
          </a>

          

          
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="../../../search/" method="get">
    <input type="text" name="q" placeholder="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>

          
        </div>

        
        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
          
            
            
              
            
            
              <ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../../start/intro/">Intro to Ceph</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../install/">Installing Ceph</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../cephadm/">Cephadm</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../../">Ceph Storage Cluster</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../../configuration/">Configuration</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="../">Operations</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="../operating/">Operating a Cluster</a></li>
<li class="toctree-l3"><a class="reference internal" href="../health-checks/">Health Checks</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">Monitoring a Cluster</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#id2">Using the Command Line</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id5">Checking a Cluster's Status</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id6">Watching a Cluster</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id7">Monitoring Health Checks</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id10">Detecting Configuration Issues</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id11">Checking a Cluster's Usage Stats</a></li>
<li class="toctree-l4"><a class="reference internal" href="#osd">Checking OSD Status</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id12">Checking Monitor Status</a></li>
<li class="toctree-l4"><a class="reference internal" href="#mds">Checking MDS Status</a></li>
<li class="toctree-l4"><a class="reference internal" href="#id13">Checking Placement Group States</a></li>
<li class="toctree-l4"><a class="reference internal" href="#rados-monitoring-using-admin-socket">Using the Admin Socket</a></li>
</ul>
</li>
<li class="toctree-l3"><a class="reference internal" href="../monitoring-osd-pg/">Monitoring OSDs and PGs</a></li>
<li class="toctree-l3"><a class="reference internal" href="../user-management/">User Management</a></li>
<li class="toctree-l3"><a class="reference internal" href="../pg-repair/">Repairing PG Inconsistencies</a></li>
<li class="toctree-l3"><a class="reference internal" href="../data-placement/">Data Placement Overview</a></li>
<li class="toctree-l3"><a class="reference internal" href="../pools/">Pools</a></li>
<li class="toctree-l3"><a class="reference internal" href="../erasure-code/">Erasure Code</a></li>
<li class="toctree-l3"><a class="reference internal" href="../cache-tiering/">Cache Tiering</a></li>
<li class="toctree-l3"><a class="reference internal" href="../placement-groups/">Placement Groups</a></li>
<li class="toctree-l3"><a class="reference internal" href="../balancer/">Balancer</a></li>
<li class="toctree-l3"><a class="reference internal" href="../upmap/">Using pg-upmap</a></li>
<li class="toctree-l3"><a class="reference internal" href="../crush-map/">CRUSH Maps</a></li>
<li class="toctree-l3"><a class="reference internal" href="../crush-map-edits/">Manually Editing a CRUSH Map</a></li>
<li class="toctree-l3"><a class="reference internal" href="../stretch-mode/">Stretch Clusters</a></li>
<li class="toctree-l3"><a class="reference internal" href="../change-mon-elections/">Configure Monitor Election Strategies</a></li>
<li class="toctree-l3"><a class="reference internal" href="../add-or-rm-osds/">Adding/Removing OSDs</a></li>
<li class="toctree-l3"><a class="reference internal" href="../add-or-rm-mons/">Adding/Removing Monitors</a></li>
<li class="toctree-l3"><a class="reference internal" href="../devices/">Device Management</a></li>
<li class="toctree-l3"><a class="reference internal" href="../bluestore-migration/">BlueStore Migration</a></li>
<li class="toctree-l3"><a class="reference internal" href="../control/">Command Reference</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/community/">The Ceph Community</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/troubleshooting-mon/">Troubleshooting Monitors</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/troubleshooting-osd/">Troubleshooting OSDs</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/troubleshooting-pg/">Troubleshooting PGs</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/log-and-debug/">Logging and Debugging</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/cpu-profiling/">CPU Profiling</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../troubleshooting/memory-profiling/">Memory Profiling</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../man/">Man Pages</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../troubleshooting/">Troubleshooting</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../api/">APIs</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../../cephfs/">Ceph File System</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../rbd/">Ceph Block Device</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../radosgw/">Ceph Object Gateway</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../mgr/">Ceph Manager Daemon</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../mgr/dashboard/">Ceph Dashboard</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../api/">API Documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../architecture/">Architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../dev/developer_guide/">Developer Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../dev/internals/">Ceph Internals</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../governance/">Governance</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../foundation/">Ceph Foundation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../ceph-volume/">ceph-volume</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../releases/general/">Ceph Releases (general)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../releases/">Ceph Releases (index)</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../security/">Security</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../glossary/">Ceph Glossary</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../jaegertracing/">Tracing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../../translation_cn/">Chinese Translation Resources</a></li>
</ul>

            
          
        </div>
        
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" aria-label="top navigation">
        
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="../../../">Ceph</a>
        
      </nav>


      <div class="wy-nav-content">
        
        <div class="rst-content">
        
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
            
<div id="dev-warning" class="admonition note">
  <p class="first admonition-title">Notice</p>
  <p class="last">This document is for a development version of Ceph.</p>
</div>
  <div id="docubetter" align="right" style="padding: 5px; font-weight: bold;">
    <a href="https://pad.ceph.com/p/Report_Documentation_Bugs">Report a Documentation Bug</a>
  </div>

  
  <div class="section" id="id1">
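<h1>Monitoring a Cluster<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h1>
<p>Once a cluster is up and running, you can monitor it with the <code class="docutils literal notranslate"><span class="pre">ceph</span></code> tool. Monitoring typically involves checking OSD status, monitor status, placement group status, and metadata server status.</p>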
<div class="section" id="id2">
<h2>Using the Command Line<a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h2>
<div class="section" id="id3">
<h3>Interactive Mode<a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h3>
<p>To run the <code class="docutils literal notranslate"><span class="pre">ceph</span></code> tool in interactive mode, run <code class="docutils literal notranslate"><span class="pre">ceph</span></code> with no arguments.
For example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span>
<span class="n">ceph</span><span class="o">&gt;</span> <span class="n">health</span>
<span class="n">ceph</span><span class="o">&gt;</span> <span class="n">status</span>
<span class="n">ceph</span><span class="o">&gt;</span> <span class="n">quorum_status</span>
<span class="n">ceph</span><span class="o">&gt;</span> <span class="n">mon</span> <span class="n">stat</span>
</pre></div>
</div>
</div>
<div class="section" id="id4">
<h3>Non-default Paths<a class="headerlink" href="#id4" title="Permalink to this headline">¶</a></h3>
<p>If your configuration file or keyring is not in the default location, you can specify its path explicitly:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="o">-</span><span class="n">c</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">conf</span> <span class="o">-</span><span class="n">k</span> <span class="o">/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">keyring</span> <span class="n">health</span>
</pre></div>
</div>
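<p>When scripting around the command line, the same options can be assembled programmatically. The following is a minimal Python sketch and is not part of Ceph: the helper name <code class="docutils literal notranslate"><span class="pre">ceph_argv</span></code> and its parameters are hypothetical, and it only builds the argument list for the <code class="docutils literal notranslate"><span class="pre">-c</span></code> and <code class="docutils literal notranslate"><span class="pre">-k</span></code> options shown above without running anything.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>def ceph_argv(subcommand, conf=None, keyring=None):
    """Build an argv list for the ceph CLI (hypothetical helper)."""
    argv = ["ceph"]
    if conf is not None:
        argv += ["-c", conf]      # non-default configuration file
    if keyring is not None:
        argv += ["-k", keyring]   # non-default keyring
    return argv + list(subcommand)


# Same invocation as the example above, expressed as a list.
print(ceph_argv(["health"], conf="/path/to/conf", keyring="/path/to/keyring"))
</pre></div>
</div>
<p>On a host where the <code class="docutils literal notranslate"><span class="pre">ceph</span></code> tool is installed, the resulting list could be passed to Python's <code class="docutils literal notranslate"><span class="pre">subprocess.run</span></code>.</p>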
</div>
</div>
<div class="section" id="id5">
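<h2>Checking a Cluster's Status<a class="headerlink" href="#id5" title="Permalink to this headline">¶</a></h2>
<p>After you start your cluster, and before you start reading or writing data, check the cluster's health.</p>
<p>To check a cluster's status, run the following command:</p>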
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">status</span>
</pre></div>
</div>
<p>Or:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="o">-</span><span class="n">s</span>
</pre></div>
</div>
<p>In interactive mode, type <code class="docutils literal notranslate"><span class="pre">status</span></code> and press <strong>Enter</strong>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span><span class="o">&gt;</span> <span class="n">status</span>
</pre></div>
</div>
<p>Ceph prints the cluster status. For example, a small demonstration cluster running each kind of service might print the following:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">cluster</span><span class="p">:</span>
  <span class="nb">id</span><span class="p">:</span>     <span class="mf">477e46</span><span class="n">f1</span><span class="o">-</span><span class="n">ae41</span><span class="o">-</span><span class="mf">4e43</span><span class="o">-</span><span class="mi">9</span><span class="n">c8f</span><span class="o">-</span><span class="mi">72</span><span class="n">c918ab0a20</span>
  <span class="n">health</span><span class="p">:</span> <span class="n">HEALTH_OK</span>

<span class="n">services</span><span class="p">:</span>
  <span class="n">mon</span><span class="p">:</span> <span class="mi">3</span> <span class="n">daemons</span><span class="p">,</span> <span class="n">quorum</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span>
  <span class="n">mgr</span><span class="p">:</span> <span class="n">x</span><span class="p">(</span><span class="n">active</span><span class="p">)</span>
  <span class="n">mds</span><span class="p">:</span> <span class="n">cephfs_a</span><span class="o">-</span><span class="mi">1</span><span class="o">/</span><span class="mi">1</span><span class="o">/</span><span class="mi">1</span> <span class="n">up</span>  <span class="p">{</span><span class="mi">0</span><span class="o">=</span><span class="n">a</span><span class="o">=</span><span class="n">up</span><span class="p">:</span><span class="n">active</span><span class="p">},</span> <span class="mi">2</span> <span class="n">up</span><span class="p">:</span><span class="n">standby</span>
  <span class="n">osd</span><span class="p">:</span> <span class="mi">3</span> <span class="n">osds</span><span class="p">:</span> <span class="mi">3</span> <span class="n">up</span><span class="p">,</span> <span class="mi">3</span> <span class="ow">in</span>

<span class="n">data</span><span class="p">:</span>
  <span class="n">pools</span><span class="p">:</span>   <span class="mi">2</span> <span class="n">pools</span><span class="p">,</span> <span class="mi">16</span> <span class="n">pgs</span>
  <span class="n">objects</span><span class="p">:</span> <span class="mi">21</span> <span class="n">objects</span><span class="p">,</span> <span class="mf">2.19</span><span class="n">K</span>
  <span class="n">usage</span><span class="p">:</span>   <span class="mi">546</span> <span class="n">GB</span> <span class="n">used</span><span class="p">,</span> <span class="mi">384</span> <span class="n">GB</span> <span class="o">/</span> <span class="mi">931</span> <span class="n">GB</span> <span class="n">avail</span>
  <span class="n">pgs</span><span class="p">:</span>     <span class="mi">16</span> <span class="n">active</span><span class="o">+</span><span class="n">clean</span>
</pre></div>
</div>
<div class="topic">
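<p class="topic-title">How Ceph Calculates Data Usage</p>
<p>The <code class="docutils literal notranslate"><span class="pre">usage</span></code> value reflects the <em>actual</em> amount of raw storage used.
The <code class="docutils literal notranslate"><span class="pre">xxx</span> <span class="pre">GB</span> <span class="pre">/</span> <span class="pre">xxx</span> <span class="pre">GB</span></code> value compares the amount of space still available (the smaller number) with the total storage capacity of the cluster. The notional number reflects the size of the stored data before it is replicated, cloned, or snapshotted. The storage that the data actually occupies therefore usually exceeds the notional amount, because Ceph automatically creates replicas of the data and may also use storage capacity for clones and snapshots.</p>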
</div>
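<p>To make the relationship between these numbers concrete, the following Python sketch reads the <code class="docutils literal notranslate"><span class="pre">usage</span></code> line from the sample output above and computes the used and free fractions of raw capacity. It is an illustration only: the function name <code class="docutils literal notranslate"><span class="pre">parse_usage</span></code> is hypothetical, and the pattern assumes whole-number GB values exactly as printed in the sample, so other units or layouts would need a different pattern.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import re

# The usage line copied from the sample status output above.
USAGE_LINE = "usage:   546 GB used, 384 GB / 931 GB avail"


def parse_usage(line):
    """Return (used, avail, total) in GB from a usage line like the sample."""
    match = re.search(r"(\d+) GB used, (\d+) GB / (\d+) GB avail", line)
    if match is None:
        raise ValueError("unrecognized usage line: " + line)
    used, avail, total = (int(group) for group in match.groups())
    return used, avail, total


used, avail, total = parse_usage(USAGE_LINE)
# Fractions are relative to the total raw capacity of the cluster.
print("raw space used: {:.1%}".format(used / total))
print("raw space free: {:.1%}".format(avail / total))
</pre></div>
</div>
<p>In the sample, the used and available figures add up to 930 GB rather than 931 GB, presumably because each figure is rounded independently.</p>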
</div>
<div class="section" id="id6">
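<h2>Watching a Cluster<a class="headerlink" href="#id6" title="Permalink to this headline">¶</a></h2>
<p>In addition to the local logs kept by each daemon, a Ceph cluster maintains a <em>cluster log</em>
that records high-level events about the whole system. This log is written to disk on the monitor servers (by default at <code class="docutils literal notranslate"><span class="pre">/var/log/ceph/ceph.log</span></code>), and it can also be monitored from the command line.</p>
<p>To follow the cluster log, run the following command:</p>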
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="o">-</span><span class="n">w</span>
</pre></div>
</div>
<p>Ceph prints the status of the system, followed by each log message as it is emitted. For example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">cluster</span><span class="p">:</span>
  <span class="nb">id</span><span class="p">:</span>     <span class="mf">477e46</span><span class="n">f1</span><span class="o">-</span><span class="n">ae41</span><span class="o">-</span><span class="mf">4e43</span><span class="o">-</span><span class="mi">9</span><span class="n">c8f</span><span class="o">-</span><span class="mi">72</span><span class="n">c918ab0a20</span>
  <span class="n">health</span><span class="p">:</span> <span class="n">HEALTH_OK</span>

<span class="n">services</span><span class="p">:</span>
  <span class="n">mon</span><span class="p">:</span> <span class="mi">3</span> <span class="n">daemons</span><span class="p">,</span> <span class="n">quorum</span> <span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">,</span><span class="n">c</span>
  <span class="n">mgr</span><span class="p">:</span> <span class="n">x</span><span class="p">(</span><span class="n">active</span><span class="p">)</span>
  <span class="n">mds</span><span class="p">:</span> <span class="n">cephfs_a</span><span class="o">-</span><span class="mi">1</span><span class="o">/</span><span class="mi">1</span><span class="o">/</span><span class="mi">1</span> <span class="n">up</span>  <span class="p">{</span><span class="mi">0</span><span class="o">=</span><span class="n">a</span><span class="o">=</span><span class="n">up</span><span class="p">:</span><span class="n">active</span><span class="p">},</span> <span class="mi">2</span> <span class="n">up</span><span class="p">:</span><span class="n">standby</span>
  <span class="n">osd</span><span class="p">:</span> <span class="mi">3</span> <span class="n">osds</span><span class="p">:</span> <span class="mi">3</span> <span class="n">up</span><span class="p">,</span> <span class="mi">3</span> <span class="ow">in</span>

<span class="n">data</span><span class="p">:</span>
  <span class="n">pools</span><span class="p">:</span>   <span class="mi">2</span> <span class="n">pools</span><span class="p">,</span> <span class="mi">16</span> <span class="n">pgs</span>
  <span class="n">objects</span><span class="p">:</span> <span class="mi">21</span> <span class="n">objects</span><span class="p">,</span> <span class="mf">2.19</span><span class="n">K</span>
  <span class="n">usage</span><span class="p">:</span>   <span class="mi">546</span> <span class="n">GB</span> <span class="n">used</span><span class="p">,</span> <span class="mi">384</span> <span class="n">GB</span> <span class="o">/</span> <span class="mi">931</span> <span class="n">GB</span> <span class="n">avail</span>
  <span class="n">pgs</span><span class="p">:</span>     <span class="mi">16</span> <span class="n">active</span><span class="o">+</span><span class="n">clean</span>


<span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">24</span> <span class="mi">08</span><span class="p">:</span><span class="mi">15</span><span class="p">:</span><span class="mf">11.329298</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">23</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">INF</span><span class="p">]</span> <span class="n">osd</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6806</span><span class="o">/</span><span class="mi">20527</span> <span class="n">boot</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">24</span> <span class="mi">08</span><span class="p">:</span><span class="mi">15</span><span class="p">:</span><span class="mf">14.258143</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">39</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">INF</span><span class="p">]</span> <span class="n">Activating</span> <span class="n">manager</span> <span class="n">daemon</span> <span class="n">x</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">24</span> <span class="mi">08</span><span class="p">:</span><span class="mi">15</span><span class="p">:</span><span class="mf">15.446025</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">47</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">INF</span><span class="p">]</span> <span class="n">Manager</span> <span class="n">daemon</span> <span class="n">x</span> <span class="ow">is</span> <span class="n">now</span> <span class="n">available</span>
</pre></div>
</div>
<p>In addition to using <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">-w</span></code> to print log lines as they are emitted, you can use
<code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">log</span> <span class="pre">last</span> <span class="pre">[n]</span></code> to see the most recent <code class="docutils literal notranslate"><span class="pre">n</span></code> lines of the cluster log.</p>
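<p>Cluster log lines such as the ones in the sample above can be split into fields for filtering or alerting. The following Python sketch assumes the exact line layout shown in that sample; other Ceph releases may format lines differently. The field labels (<code class="docutils literal notranslate"><span class="pre">time</span></code>, <code class="docutils literal notranslate"><span class="pre">source</span></code>, <code class="docutils literal notranslate"><span class="pre">channel</span></code>, <code class="docutils literal notranslate"><span class="pre">severity</span></code>, <code class="docutils literal notranslate"><span class="pre">message</span></code>) are names chosen for this illustration, not official Ceph terminology.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import re

# Pattern for the sample layout only (an assumption, not a format guarantee):
# timestamp, reporting daemon, two skipped tokens, a number, " : ",
# channel, [severity], free-form message.
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(\S+) \S+ \S+ \d+ : "
    r"(\w+) \[(\w+)\] (.*)$"
)


def parse_log_line(line):
    """Split one cluster log line into labeled fields, or return None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    stamp, source, channel, severity, message = match.groups()
    return {
        "time": stamp,
        "source": source,
        "channel": channel,
        "severity": severity,
        "message": message,
    }


sample = ("2017-07-24 08:15:14.258143 mon.a mon.0 172.21.9.34:6789/0 39 : "
          "cluster [INF] Activating manager daemon x")
print(parse_log_line(sample))
</pre></div>
</div>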
</div>
<div class="section" id="id7">
<h2>Monitoring Health Checks<a class="headerlink" href="#id7" title="Permalink to this headline">¶</a></h2>
<p>Ceph continuously runs <em>health checks</em> against its own status. When a health check fails, the failure is reflected in the output of
<code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">status</span></code> (or <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">health</span></code>). In addition, messages are sent to the cluster log when a check fails and when the cluster recovers.</p>
<p>For example, when an OSD goes down, the <code class="docutils literal notranslate"><span class="pre">health</span></code> section of the status output may be updated as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">health</span><span class="p">:</span> <span class="n">HEALTH_WARN</span>
        <span class="mi">1</span> <span class="n">osds</span> <span class="n">down</span>
        <span class="n">Degraded</span> <span class="n">data</span> <span class="n">redundancy</span><span class="p">:</span> <span class="mi">21</span><span class="o">/</span><span class="mi">63</span> <span class="n">objects</span> <span class="n">degraded</span> <span class="p">(</span><span class="mf">33.333</span><span class="o">%</span><span class="p">),</span> <span class="mi">16</span> <span class="n">pgs</span> <span class="n">unclean</span><span class="p">,</span> <span class="mi">16</span> <span class="n">pgs</span> <span class="n">degraded</span>
</pre></div>
</div>
<p>At the same time, cluster log messages are emitted to record the failure of the health checks:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">25</span> <span class="mi">10</span><span class="p">:</span><span class="mi">08</span><span class="p">:</span><span class="mf">58.265945</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">91</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">WRN</span><span class="p">]</span> <span class="n">Health</span> <span class="n">check</span> <span class="n">failed</span><span class="p">:</span> <span class="mi">1</span> <span class="n">osds</span> <span class="n">down</span> <span class="p">(</span><span class="n">OSD_DOWN</span><span class="p">)</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">25</span> <span class="mi">10</span><span class="p">:</span><span class="mi">09</span><span class="p">:</span><span class="mf">01.302624</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">94</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">WRN</span><span class="p">]</span> <span class="n">Health</span> <span class="n">check</span> <span class="n">failed</span><span class="p">:</span> <span class="n">Degraded</span> <span class="n">data</span> <span class="n">redundancy</span><span class="p">:</span> <span class="mi">21</span><span class="o">/</span><span class="mi">63</span> <span class="n">objects</span> <span class="n">degraded</span> <span class="p">(</span><span class="mf">33.333</span><span class="o">%</span><span class="p">),</span> <span class="mi">16</span> <span class="n">pgs</span> <span class="n">unclean</span><span class="p">,</span> <span class="mi">16</span> <span class="n">pgs</span> <span class="n">degraded</span> <span class="p">(</span><span class="n">PG_DEGRADED</span><span class="p">)</span>
</pre></div>
</div>
<p>When the OSD comes back online, the cluster log records the cluster's return to a healthy state:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">25</span> <span class="mi">10</span><span class="p">:</span><span class="mi">11</span><span class="p">:</span><span class="mf">11.526841</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">109</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">WRN</span><span class="p">]</span> <span class="n">Health</span> <span class="n">check</span> <span class="n">update</span><span class="p">:</span> <span class="n">Degraded</span> <span class="n">data</span> <span class="n">redundancy</span><span class="p">:</span> <span class="mi">2</span> <span class="n">pgs</span> <span class="n">unclean</span><span class="p">,</span> <span class="mi">2</span> <span class="n">pgs</span> <span class="n">degraded</span><span class="p">,</span> <span class="mi">2</span> <span class="n">pgs</span> <span class="n">undersized</span> <span class="p">(</span><span class="n">PG_DEGRADED</span><span class="p">)</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">25</span> <span class="mi">10</span><span class="p">:</span><span class="mi">11</span><span class="p">:</span><span class="mf">13.535493</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">110</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">INF</span><span class="p">]</span> <span class="n">Health</span> <span class="n">check</span> <span class="n">cleared</span><span class="p">:</span> <span class="n">PG_DEGRADED</span> <span class="p">(</span><span class="n">was</span><span class="p">:</span> <span class="n">Degraded</span> <span class="n">data</span> <span class="n">redundancy</span><span class="p">:</span> <span class="mi">2</span> <span class="n">pgs</span> <span class="n">unclean</span><span class="p">,</span> <span class="mi">2</span> <span class="n">pgs</span> <span class="n">degraded</span><span class="p">,</span> <span class="mi">2</span> <span class="n">pgs</span> <span class="n">undersized</span><span class="p">)</span>
<span class="mi">2017</span><span class="o">-</span><span class="mi">07</span><span class="o">-</span><span class="mi">25</span> <span class="mi">10</span><span class="p">:</span><span class="mi">11</span><span class="p">:</span><span class="mf">13.535577</span> <span class="n">mon</span><span class="o">.</span><span class="n">a</span> <span class="n">mon</span><span class="mf">.0</span> <span class="mf">172.21.9.34</span><span class="p">:</span><span class="mi">6789</span><span class="o">/</span><span class="mi">0</span> <span class="mi">111</span> <span class="p">:</span> <span class="n">cluster</span> <span class="p">[</span><span class="n">INF</span><span class="p">]</span> <span class="n">Cluster</span> <span class="ow">is</span> <span class="n">now</span> <span class="n">healthy</span>
</pre></div>
</div>
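<p>The failed, update, cleared, and healthy messages above form a small state machine. The following Python sketch replays the message texts from the samples and tracks which health checks are currently active. It is an illustration under stated assumptions: the function name <code class="docutils literal notranslate"><span class="pre">apply_message</span></code> is hypothetical, and the patterns cover only the message wording shown in the samples above.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import re

# Patterns for the message wording shown in the samples (an assumption).
FAILED_OR_UPDATE = re.compile(r"^Health check (?:failed|update): (.*) \(([A-Z_]+)\)$")
CLEARED = re.compile(r"^Health check cleared: ([A-Z_]+) ")


def apply_message(active, message):
    """Update the dict of active checks (code to summary) from one message."""
    match = FAILED_OR_UPDATE.match(message)
    if match:
        summary, code = match.groups()
        active[code] = summary          # new failure or refreshed summary
        return active
    match = CLEARED.match(message)
    if match:
        active.pop(match.group(1), None)  # this check no longer applies
    elif message == "Cluster is now healthy":
        active.clear()                  # no checks remain
    return active


messages = [
    "Health check failed: 1 osds down (OSD_DOWN)",
    "Health check failed: Degraded data redundancy: 21/63 objects degraded "
    "(33.333%), 16 pgs unclean, 16 pgs degraded (PG_DEGRADED)",
    "Health check update: Degraded data redundancy: 2 pgs unclean, "
    "2 pgs degraded, 2 pgs undersized (PG_DEGRADED)",
    "Health check cleared: PG_DEGRADED (was: Degraded data redundancy: "
    "2 pgs unclean, 2 pgs degraded, 2 pgs undersized)",
    "Cluster is now healthy",
]

active = {}
for message in messages:
    apply_message(active, message)
    print(sorted(active))
</pre></div>
</div>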
<div class="section" id="id8">
<h3>Network Performance Checks<a class="headerlink" href="#id8" title="Permalink to this headline">¶</a></h3>
<p>Ceph OSDs send heartbeat ping messages to one another to monitor daemon availability. Ceph
also uses the response times to monitor network performance.
Although a busy OSD could delay a ping response, we can assume
that if a network switch fails, multiple delays will be detected between distinct pairs of OSDs.</p>
<p>By default, Ceph warns about ping times that exceed 1 second (1000 milliseconds).</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">HEALTH_WARN</span> <span class="n">Slow</span> <span class="n">OSD</span> <span class="n">heartbeats</span> <span class="n">on</span> <span class="n">back</span> <span class="p">(</span><span class="n">longest</span> <span class="mf">1118.001</span><span class="n">ms</span><span class="p">)</span>
</pre></div>
</div>
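<p>A monitoring script might extract the longest ping time from this summary line and compare it with the 1000 millisecond default. The Python sketch below is an illustration only: the function name <code class="docutils literal notranslate"><span class="pre">longest_ping_ms</span></code> is hypothetical, the pattern assumes the wording of the sample line above, and treating the word after "on" as a network label is likewise an assumption.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>import re

WARN_THRESHOLD_MS = 1000.0  # default warning threshold stated above


def longest_ping_ms(summary):
    """Return (network_label, longest_ms) from a summary line, or None."""
    match = re.search(
        r"Slow OSD heartbeats on (\w+) \(longest ([\d.]+)ms\)", summary
    )
    if match is None:
        return None
    return match.group(1), float(match.group(2))


network, longest = longest_ping_ms(
    "HEALTH_WARN Slow OSD heartbeats on back (longest 1118.001ms)"
)
excess = longest - WARN_THRESHOLD_MS
print("{}: longest ping {} ms, {:.3f} ms over the threshold".format(
    network, longest, excess))
</pre></div>
</div>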
<p>The health detail adds the combinations of OSDs that are seeing the delays and by how much. There is a limit of 10
detail line items.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="n">WRN</span><span class="p">]</span> <span class="n">OSD_SLOW_PING_TIME_BACK</span><span class="p">:</span> <span class="n">Slow</span> <span class="n">OSD</span> <span class="n">heartbeats</span> <span class="n">on</span> <span class="n">back</span> <span class="p">(</span><span class="n">longest</span> <span class="mf">1118.001</span><span class="n">ms</span><span class="p">)</span>
    <span class="n">Slow</span> <span class="n">OSD</span> <span class="n">heartbeats</span> <span class="n">on</span> <span class="n">back</span> <span class="kn">from</span> <span class="nn">osd.</span><span class="mi">0</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack1</span><span class="p">]</span> <span class="n">to</span> <span class="n">osd</span><span class="mf">.1</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack1</span><span class="p">]</span> <span class="mf">1118.001</span> <span class="n">msec</span> <span class="n">possibly</span> <span class="n">improving</span>
    <span class="n">Slow</span> <span class="n">OSD</span> <span class="n">heartbeats</span> <span class="n">on</span> <span class="n">back</span> <span class="kn">from</span> <span class="nn">osd.</span><span class="mi">0</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack1</span><span class="p">]</span> <span class="n">to</span> <span class="n">osd</span><span class="mf">.2</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack2</span><span class="p">]</span> <span class="mf">1030.123</span> <span class="n">msec</span>
    <span class="n">Slow</span> <span class="n">OSD</span> <span class="n">heartbeats</span> <span class="n">on</span> <span class="n">back</span> <span class="kn">from</span> <span class="nn">osd.</span><span class="mi">2</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack2</span><span class="p">]</span> <span class="n">to</span> <span class="n">osd</span><span class="mf">.1</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack1</span><span class="p">]</span> <span class="mf">1015.321</span> <span class="n">msec</span>
    <span class="n">Slow</span> <span class="n">OSD</span> <span class="n">heartbeats</span> <span class="n">on</span> <span class="n">back</span> <span class="kn">from</span> <span class="nn">osd.</span><span class="mi">1</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack1</span><span class="p">]</span> <span class="n">to</span> <span class="n">osd</span><span class="mf">.0</span> <span class="p">[</span><span class="n">dc1</span><span class="p">,</span><span class="n">rack1</span><span class="p">]</span> <span class="mf">1010.456</span> <span class="n">msec</span>
</pre></div>
</div>
<p>To see more detail and a complete dump of network performance information, use the <code class="docutils literal notranslate"><span class="pre">dump_osd_network</span></code> command.  Typically, this command is
sent to a mgr, but it can be limited to a particular OSD’s interactions by issuing it to any OSD.  The current threshold, which defaults to 1 second
(1000 milliseconds), can be overridden by passing an argument in milliseconds.</p>
<p>The following command will show all gathered network performance data by specifying a threshold of 0 and sending to the mgr.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ceph daemon /var/run/ceph/ceph-mgr.x.asok dump_osd_network 0
{
    &quot;threshold&quot;: 0,
    &quot;entries&quot;: [
        {
            &quot;last update&quot;: &quot;Wed Sep  4 17:04:49 2019&quot;,
            &quot;stale&quot;: false,
            &quot;from osd&quot;: 2,
            &quot;to osd&quot;: 0,
            &quot;interface&quot;: &quot;front&quot;,
            &quot;average&quot;: {
                &quot;1min&quot;: 1.023,
                &quot;5min&quot;: 0.860,
                &quot;15min&quot;: 0.883
            },
            &quot;min&quot;: {
                &quot;1min&quot;: 0.818,
                &quot;5min&quot;: 0.607,
                &quot;15min&quot;: 0.607
            },
            &quot;max&quot;: {
                &quot;1min&quot;: 1.164,
                &quot;5min&quot;: 1.173,
                &quot;15min&quot;: 1.544
            },
            &quot;last&quot;: 0.924
        },
        {
            &quot;last update&quot;: &quot;Wed Sep  4 17:04:49 2019&quot;,
            &quot;stale&quot;: false,
            &quot;from osd&quot;: 2,
            &quot;to osd&quot;: 0,
            &quot;interface&quot;: &quot;back&quot;,
            &quot;average&quot;: {
                &quot;1min&quot;: 0.968,
                &quot;5min&quot;: 0.897,
                &quot;15min&quot;: 0.830
            },
            &quot;min&quot;: {
                &quot;1min&quot;: 0.860,
                &quot;5min&quot;: 0.563,
                &quot;15min&quot;: 0.502
            },
            &quot;max&quot;: {
                &quot;1min&quot;: 1.171,
                &quot;5min&quot;: 1.216,
                &quot;15min&quot;: 1.456
            },
            &quot;last&quot;: 0.845
        },
        {
            &quot;last update&quot;: &quot;Wed Sep  4 17:04:48 2019&quot;,
            &quot;stale&quot;: false,
            &quot;from osd&quot;: 0,
            &quot;to osd&quot;: 1,
            &quot;interface&quot;: &quot;front&quot;,
            &quot;average&quot;: {
                &quot;1min&quot;: 0.965,
                &quot;5min&quot;: 0.811,
                &quot;15min&quot;: 0.850
            },
            &quot;min&quot;: {
                &quot;1min&quot;: 0.650,
                &quot;5min&quot;: 0.488,
                &quot;15min&quot;: 0.466
            },
            &quot;max&quot;: {
                &quot;1min&quot;: 1.252,
                &quot;5min&quot;: 1.252,
                &quot;15min&quot;: 1.362
            },
            &quot;last&quot;: 0.791
        },
    ...
</pre></div>
</div>
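<p>As an illustration of how this dump might be post-processed, the following minimal sketch filters the entries by their average ping time. The function name <code class="docutils literal notranslate"><span class="pre">slow_heartbeats</span></code> and the trimmed <code class="docutils literal notranslate"><span class="pre">SAMPLE</span></code> data are hypothetical; the field names are copied from the sample output above, and the values are assumed to be in milliseconds because the threshold argument is.</p>

```python
import json


def slow_heartbeats(dump, threshold_ms, window="1min"):
    """Return (from_osd, to_osd, interface, average) for slow entries.

    Assumes the values in the dump are milliseconds.
    """
    slow = []
    for entry in dump.get("entries", []):
        avg = entry["average"][window]
        if avg > threshold_ms:
            slow.append(
                (entry["from osd"], entry["to osd"], entry["interface"], avg)
            )
    # Worst offenders first.
    return sorted(slow, key=lambda item: item[3], reverse=True)


# Trimmed copy of the sample output above (only the fields used here).
SAMPLE = json.loads("""
{
  "threshold": 0,
  "entries": [
    {"from osd": 2, "to osd": 0, "interface": "front",
     "average": {"1min": 1.023, "5min": 0.860, "15min": 0.883}},
    {"from osd": 2, "to osd": 0, "interface": "back",
     "average": {"1min": 0.968, "5min": 0.897, "15min": 0.830}},
    {"from osd": 0, "to osd": 1, "interface": "front",
     "average": {"1min": 0.965, "5min": 0.811, "15min": 0.850}}
  ]
}
""")

for row in slow_heartbeats(SAMPLE, threshold_ms=1.0):
    print("osd.%d -> osd.%d (%s): %.3f ms" % row)
# prints: osd.2 -> osd.0 (front): 1.023 ms
```

<p>Filtering on the client side like this lets one dump (taken with a threshold of 0) be inspected at several thresholds without querying the daemon again.</p>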
</div>
<div class="section" id="id9">
<h3>Muting Health Checks<a class="headerlink" href="#id9" title="Permalink to this headline">¶</a></h3>
<p>Health checks can be muted so that they do not affect the overall
reported status of the cluster.  Alerts are specified using the health
check code (see <a class="reference internal" href="../health-checks/#health-checks"><span class="std std-ref">Health checks</span></a>):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">health</span> <span class="n">mute</span> <span class="o">&lt;</span><span class="n">code</span><span class="o">&gt;</span>
</pre></div>
</div>
<p>If there is a health warning, muting it will make the
cluster report an overall status of <code class="docutils literal notranslate"><span class="pre">HEALTH_OK</span></code>.  For example, to
mute an <code class="docutils literal notranslate"><span class="pre">OSD_DOWN</span></code> alert:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">health</span> <span class="n">mute</span> <span class="n">OSD_DOWN</span>
</pre></div>
</div>
<p>Mutes are reported as part of the short and long form of the <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">health</span></code> command.
For example, in the above scenario, the cluster would report:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>$ ceph health
HEALTH_OK (muted: OSD_DOWN)
$ ceph health detail
HEALTH_OK (muted: OSD_DOWN)
(MUTED) OSD_DOWN 1 osds down
    osd.1 is down
</pre></div>
</div>
<p>A mute can be explicitly removed with:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">health</span> <span class="n">unmute</span> <span class="o">&lt;</span><span class="n">code</span><span class="o">&gt;</span>
</pre></div>
</div>
<p>For example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">health</span> <span class="n">unmute</span> <span class="n">OSD_DOWN</span>
</pre></div>
</div>
<p>A health check mute may optionally have a TTL (time to live)
associated with it, such that the mute will automatically expire
after the specified period of time has elapsed.  The TTL is specified as an optional
duration argument, e.g.:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">health</span> <span class="n">mute</span> <span class="n">OSD_DOWN</span> <span class="mi">4</span><span class="n">h</span>    <span class="c1"># mute for 4 hours</span>
<span class="n">ceph</span> <span class="n">health</span> <span class="n">mute</span> <span class="n">MON_DOWN</span> <span class="mi">15</span><span class="n">m</span>   <span class="c1"># mute for 15 minutes</span>
</pre></div>
</div>
<p>Normally, if a muted health alert is resolved (e.g., in the example
above, the OSD comes back up), the mute goes away.  If the alert comes
back later, it will be reported in the usual way.</p>
<p>It is possible to make a mute “sticky” such that the mute will remain even if the
alert clears.  For example:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">health</span> <span class="n">mute</span> <span class="n">OSD_DOWN</span> <span class="mi">1</span><span class="n">h</span> <span class="o">--</span><span class="n">sticky</span>   <span class="c1"># ignore any/all down OSDs for next hour</span>
</pre></div>
</div>
<p>Most health mutes also disappear if the extent of an alert gets worse.  For example,
if there is one OSD down, and the alert is muted, the mute will disappear if one
or more additional OSDs go down.  This is true for any health alert that involves
a count indicating how much or how many of something is triggering the warning or
error.</p>
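<p>When mutes are applied from a script, it can help to assemble the command line in one place. The sketch below only builds argument lists for the <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">health</span> <span class="pre">mute</span></code> and <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">health</span> <span class="pre">unmute</span></code> forms shown above. The helper names are hypothetical, and the TTL check deliberately accepts only the <code class="docutils literal notranslate"><span class="pre">m</span></code> and <code class="docutils literal notranslate"><span class="pre">h</span></code> suffixes that appear in the examples; it is an assumption of this sketch, not a statement about what the CLI accepts.</p>

```python
import re

# Durations such as "4h" or "15m", as in the examples above. Other
# suffixes are not assumed by this sketch.
_TTL = re.compile(r"^\d+[mh]$")


def mute_command(code, ttl=None, sticky=False):
    """Build the argv list for muting one health check code."""
    argv = ["ceph", "health", "mute", code]
    if ttl is not None:
        if not _TTL.match(ttl):
            raise ValueError("unsupported TTL in this sketch: %r" % ttl)
        argv.append(ttl)
    if sticky:
        argv.append("--sticky")
    return argv


def unmute_command(code):
    """Build the argv list for removing a mute."""
    return ["ceph", "health", "unmute", code]


print(" ".join(mute_command("OSD_DOWN", "1h", sticky=True)))
# prints: ceph health mute OSD_DOWN 1h --sticky
```

<p>The returned list can be passed to <code class="docutils literal notranslate"><span class="pre">subprocess.run(argv,</span> <span class="pre">check=True)</span></code> on a host that has the <code class="docutils literal notranslate"><span class="pre">ceph</span></code> CLI and a suitable keyring.</p>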
</div>
</div>
<div class="section" id="id10">
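<h2>Detecting Configuration Issues<a class="headerlink" href="#id10" title="Permalink to this headline">¶</a></h2>
<p>In addition to the health checks that Ceph continuously runs on its own status,
some configuration issues can only be detected by an external tool.</p>
<p>Use the <a class="reference external" href="http://docs.ceph.com/ceph-medic/master/">ceph-medic</a> tool to perform an additional check on your
Ceph cluster's configuration.</p>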
</div>
<div class="section" id="id11">
<h2>Checking a Cluster's Usage Stats<a class="headerlink" href="#id11" title="Permalink to this headline">¶</a></h2>
<p>To check a cluster's data usage and how that data is distributed among pools,
use the <code class="docutils literal notranslate"><span class="pre">df</span></code> option. It is similar to the Linux <code class="docutils literal notranslate"><span class="pre">df</span></code> command.
Execute the following:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">df</span>
</pre></div>
</div>
<p>The output of <code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">df</span></code> looks like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CLASS</span>     <span class="n">SIZE</span>    <span class="n">AVAIL</span>     <span class="n">USED</span>  <span class="n">RAW</span> <span class="n">USED</span>  <span class="o">%</span><span class="n">RAW</span> <span class="n">USED</span>
<span class="n">ssd</span>    <span class="mi">202</span> <span class="n">GiB</span>  <span class="mi">200</span> <span class="n">GiB</span>  <span class="mf">2.0</span> <span class="n">GiB</span>   <span class="mf">2.0</span> <span class="n">GiB</span>       <span class="mf">1.00</span>
<span class="n">TOTAL</span>  <span class="mi">202</span> <span class="n">GiB</span>  <span class="mi">200</span> <span class="n">GiB</span>  <span class="mf">2.0</span> <span class="n">GiB</span>   <span class="mf">2.0</span> <span class="n">GiB</span>       <span class="mf">1.00</span>

<span class="o">---</span> <span class="n">POOLS</span> <span class="o">---</span>
<span class="n">POOL</span>                   <span class="n">ID</span>  <span class="n">PGS</span>   <span class="n">STORED</span>   <span class="p">(</span><span class="n">DATA</span><span class="p">)</span>   <span class="p">(</span><span class="n">OMAP</span><span class="p">)</span>   <span class="n">OBJECTS</span>     <span class="n">USED</span>  <span class="p">(</span><span class="n">DATA</span><span class="p">)</span>   <span class="p">(</span><span class="n">OMAP</span><span class="p">)</span>   <span class="o">%</span><span class="n">USED</span>  <span class="n">MAX</span> <span class="n">AVAIL</span>  <span class="n">QUOTA</span> <span class="n">OBJECTS</span>  <span class="n">QUOTA</span> <span class="n">BYTES</span>  <span class="n">DIRTY</span>  <span class="n">USED</span> <span class="n">COMPR</span>  <span class="n">UNDER</span> <span class="n">COMPR</span>
<span class="n">device_health_metrics</span>   <span class="mi">1</span>    <span class="mi">1</span>  <span class="mi">242</span> <span class="n">KiB</span>   <span class="mi">15</span> <span class="n">KiB</span>  <span class="mi">227</span> <span class="n">KiB</span>         <span class="mi">4</span>  <span class="mi">251</span> <span class="n">KiB</span>  <span class="mi">24</span> <span class="n">KiB</span>  <span class="mi">227</span> <span class="n">KiB</span>       <span class="mi">0</span>    <span class="mi">297</span> <span class="n">GiB</span>            <span class="n">N</span><span class="o">/</span><span class="n">A</span>          <span class="n">N</span><span class="o">/</span><span class="n">A</span>      <span class="mi">4</span>         <span class="mi">0</span> <span class="n">B</span>          <span class="mi">0</span> <span class="n">B</span>
<span class="n">cephfs</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">meta</span>           <span class="mi">2</span>   <span class="mi">32</span>  <span class="mf">6.8</span> <span class="n">KiB</span>  <span class="mf">6.8</span> <span class="n">KiB</span>      <span class="mi">0</span> <span class="n">B</span>        <span class="mi">22</span>   <span class="mi">96</span> <span class="n">KiB</span>  <span class="mi">96</span> <span class="n">KiB</span>      <span class="mi">0</span> <span class="n">B</span>       <span class="mi">0</span>    <span class="mi">297</span> <span class="n">GiB</span>            <span class="n">N</span><span class="o">/</span><span class="n">A</span>          <span class="n">N</span><span class="o">/</span><span class="n">A</span>     <span class="mi">22</span>         <span class="mi">0</span> <span class="n">B</span>          <span class="mi">0</span> <span class="n">B</span>
<span class="n">cephfs</span><span class="o">.</span><span class="n">a</span><span class="o">.</span><span class="n">data</span>           <span class="mi">3</span>   <span class="mi">32</span>      <span class="mi">0</span> <span class="n">B</span>      <span class="mi">0</span> <span class="n">B</span>      <span class="mi">0</span> <span class="n">B</span>         <span class="mi">0</span>      <span class="mi">0</span> <span class="n">B</span>     <span class="mi">0</span> <span class="n">B</span>      <span class="mi">0</span> <span class="n">B</span>       <span class="mi">0</span>     <span class="mi">99</span> <span class="n">GiB</span>            <span class="n">N</span><span class="o">/</span><span class="n">A</span>          <span class="n">N</span><span class="o">/</span><span class="n">A</span>      <span class="mi">0</span>         <span class="mi">0</span> <span class="n">B</span>          <span class="mi">0</span> <span class="n">B</span>
<span class="n">test</span>                    <span class="mi">4</span>   <span class="mi">32</span>   <span class="mi">22</span> <span class="n">MiB</span>   <span class="mi">22</span> <span class="n">MiB</span>   <span class="mi">50</span> <span class="n">KiB</span>       <span class="mi">248</span>   <span class="mi">19</span> <span class="n">MiB</span>  <span class="mi">19</span> <span class="n">MiB</span>   <span class="mi">50</span> <span class="n">KiB</span>       <span class="mi">0</span>    <span class="mi">297</span> <span class="n">GiB</span>            <span class="n">N</span><span class="o">/</span><span class="n">A</span>          <span class="n">N</span><span class="o">/</span><span class="n">A</span>    <span class="mi">248</span>         <span class="mi">0</span> <span class="n">B</span>          <span class="mi">0</span> <span class="n">B</span>
</pre></div>
</div>
<p>The <strong>RAW STORAGE</strong> section of the output provides an overview of the amount of storage that your cluster manages.</p>
<ul class="simple">
<li><p><strong>CLASS:</strong> The class of OSD device (or the total for the cluster)</p></li>
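<li><p><strong>SIZE:</strong> The amount of storage capacity managed by the cluster.</p></li>
<li><p><strong>AVAIL:</strong> The amount of free space available in the cluster.</p></li>
<li><p><strong>USED:</strong> The amount of raw storage consumed by user data.</p></li>
<li><p><strong>RAW USED:</strong> The total amount of raw storage consumed, including user data, internal overhead, and reserved capacity.</p></li>
<li><p><strong>% RAW USED:</strong> The percentage of raw storage used.
Use this number in conjunction with the <code class="docutils literal notranslate"><span class="pre">full</span> <span class="pre">ratio</span></code> and <code class="docutils literal notranslate"><span class="pre">near</span> <span class="pre">full</span> <span class="pre">ratio</span></code>
to ensure that you are not reaching your cluster's capacity.
See <a class="reference external" href="../../configuration/mon-config-ref#storage-capacity">Storage Capacity</a> for additional details.</p></li>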
</ul>
<p><strong>POOLS:</strong></p>
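<p>The <strong>POOLS</strong> section of the output provides a list of pools and the notional usage of each pool. The output from this section does <strong>not</strong> reflect replicas, clones, or snapshots. For example, if you store an object with
1MB of data, the notional usage will be 1MB, but the actual usage may be 2MB or more depending on the number of replicas, clones, and snapshots.</p>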
<ul class="simple">
<li><p><strong>ID:</strong> The numeric identifier of the pool.</p></li>
<li><p><strong>STORED:</strong> The actual amount of data that the user or Ceph has stored in a pool. This is
similar to the USED column in earlier versions of Ceph, but the calculations
(for BlueStore) are more precise because gaps are properly handled.</p>
<ul>
<li><p><strong>(DATA):</strong> usage for RBD (RADOS Block Device), CephFS file data, and RGW
(RADOS Gateway) object data.</p></li>
<li><p><strong>(OMAP):</strong> key-value pairs. Used primarily by CephFS and RGW (RADOS
Gateway) for metadata storage.</p></li>
</ul>
</li>
<li><p><strong>OBJECTS:</strong> The notional number of objects stored per pool. “Notional” is
defined above in the paragraph immediately under “POOLS”.</p></li>
<li><p><strong>USED:</strong> The space allocated for a pool over all OSDs. This includes
replication, allocation granularity, and erasure-coding overhead. Compression
savings and object content gaps are also taken into account. BlueStore’s
database is not included in this amount.</p>
<ul>
<li><p><strong>(DATA):</strong> object usage for RBD (RADOS Block Device), CephFS file data, and RGW
(RADOS Gateway) object data.</p></li>
<li><p><strong>(OMAP):</strong> object key-value pairs. Used primarily by CephFS and RGW (RADOS
Gateway) for metadata storage.</p></li>
</ul>
</li>
<li><p><strong>%USED:</strong> The notional percentage of storage used per pool.</p></li>
<li><p><strong>MAX AVAIL:</strong> An estimate of the notional amount of data that can be written
to this pool.</p></li>
<li><p><strong>QUOTA OBJECTS:</strong> The quota on the number of objects in the pool (shown as N/A when no quota is set).</p></li>
<li><p><strong>QUOTA BYTES:</strong> The quota on the number of bytes in the pool (shown as N/A when no quota is set).</p></li>
<li><p><strong>DIRTY:</strong> The number of objects in the cache pool that have been written to
the cache pool but have not been flushed yet to the base pool. This field is
only available when cache tiering is in use.</p></li>
<li><p><strong>USED COMPR:</strong> The amount of space allocated for compressed data (that is,
compressed data plus all the allocation, replication, and erasure
coding overhead).</p></li>
<li><p><strong>UNDER COMPR:</strong> The amount of data passed through compression (summed over all
replicas) and beneficial enough to be stored in a compressed form.</p></li>
</ul>
<div class="admonition note">
<p class="admonition-title">Note</p>
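<p>The numbers in the <strong>POOLS</strong> section are notional. They do not include replicas, snapshots, or clones. As a result, the sum of the <strong>USED</strong> and <strong>%USED</strong> amounts will not add up to
the <strong>RAW USED</strong> and <strong>%RAW USED</strong> amounts in the <strong>RAW STORAGE</strong> section of the output.</p>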
</div>
<div class="admonition note">
<p class="admonition-title">Note</p>
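<p>The <strong>MAX AVAIL</strong> value is a complicated function of whether the pool is replicated or erasure-coded, the CRUSH rule that maps storage to devices, the utilization of those devices, and the configured mon_osd_full_ratio.</p>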
</div>
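<p>The gap between notional and raw usage can be made concrete with some simple arithmetic. The following is a rough sketch only: the function names are hypothetical, and it ignores allocation granularity, compression, snapshots, and clones, all of which also affect the real figures.</p>

```python
def raw_bytes_replicated(stored_bytes, replicas):
    """Raw space for a replicated pool: one full copy per replica."""
    return stored_bytes * replicas


def raw_bytes_erasure_coded(stored_bytes, k, m):
    """Raw space for an erasure-coded pool with k data and m coding chunks."""
    return stored_bytes * (k + m) / k


MIB = 1024 * 1024

# 1 MiB of notional data in a pool with 3 replicas.
print(raw_bytes_replicated(1 * MIB, 3) / MIB)        # prints 3.0
# 1 MiB of notional data in a k=4, m=2 erasure-coded pool.
print(raw_bytes_erasure_coded(1 * MIB, 4, 2) / MIB)  # prints 1.5
```

<p>This is why summing the per-pool notional figures gives a smaller number than the raw usage reported for the cluster as a whole.</p>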
</div>
<div class="section" id="osd">
<h2>Checking OSD Status<a class="headerlink" href="#osd" title="Permalink to this headline">¶</a></h2>
<p>You can check OSDs to ensure that they are <code class="docutils literal notranslate"><span class="pre">up</span></code> and <code class="docutils literal notranslate"><span class="pre">in</span></code> by executing the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><style type="text/css">
span.prompt1:before {
  content: "# ";
}
</style><span class="prompt1">ceph osd stat</span>
</pre></div></div><p>Or:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span class="prompt1">ceph osd dump</span>
</pre></div></div><p>You can also view OSDs according to their position in the CRUSH map:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span class="prompt1">ceph osd tree</span>
</pre></div></div><p>Ceph prints the CRUSH tree along with its OSDs, their status, and their weights:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1">#ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF</span>
 -1       <span class="m">3</span>.00000 pool default
 -3       <span class="m">3</span>.00000 rack mainrack
 -2       <span class="m">3</span>.00000 host osd-host
  <span class="m">0</span>   ssd <span class="m">1</span>.00000         osd.0             up  <span class="m">1</span>.00000 <span class="m">1</span>.00000
  <span class="m">1</span>   ssd <span class="m">1</span>.00000         osd.1             up  <span class="m">1</span>.00000 <span class="m">1</span>.00000
  <span class="m">2</span>   ssd <span class="m">1</span>.00000         osd.2             up  <span class="m">1</span>.00000 <span class="m">1</span>.00000
</pre></div>
</div>
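<p>For a quick scripted check, the plain-text tree can be scanned for OSD rows whose status is not <code class="docutils literal notranslate"><span class="pre">up</span></code>. This is a minimal sketch under stated assumptions: the function name <code class="docutils literal notranslate"><span class="pre">osds_not_up</span></code> is hypothetical, it assumes that the STATUS column directly follows the <code class="docutils literal notranslate"><span class="pre">osd.N</span></code> name as in the output above, and the sample has been altered so that <code class="docutils literal notranslate"><span class="pre">osd.1</span></code> is <code class="docutils literal notranslate"><span class="pre">down</span></code> for illustration.</p>

```python
def osds_not_up(tree_text):
    """Return the names of OSD rows whose STATUS column is not 'up'."""
    bad = []
    for line in tree_text.splitlines():
        fields = line.split()
        # Look for the "osd.N" name; the next field is assumed to be STATUS.
        for i, field in enumerate(fields[:-1]):
            if field.startswith("osd."):
                if fields[i + 1] != "up":
                    bad.append(field)
                break
    return bad


# The sample tree from above, with osd.1 changed to "down" for illustration.
SAMPLE = """\
#ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
 -1       3.00000 pool default
 -3       3.00000 rack mainrack
 -2       3.00000 host osd-host
  0   ssd 1.00000         osd.0             up  1.00000 1.00000
  1   ssd 1.00000         osd.1           down  1.00000 1.00000
  2   ssd 1.00000         osd.2             up  1.00000 1.00000
"""

print(osds_not_up(SAMPLE))  # prints ['osd.1']
```

<p>Parsing column-aligned text is fragile. If a machine-readable output format is available in your release, it is likely the more robust choice.</p>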
<p>For a detailed discussion, refer to <a class="reference external" href="../monitoring-osd-pg">Monitoring OSDs and Placement Groups</a>.</p>
</div>
<div class="section" id="id12">
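<h2>Checking Monitor Status<a class="headerlink" href="#id12" title="Permalink to this headline">¶</a></h2>
<p>If your cluster has multiple monitors (which is likely), check the monitor quorum status after you start the cluster and before you read or write data. A quorum must be present when multiple monitors are running. You should also check monitor status periodically to ensure that the monitors are running.</p>
<p>To display the monitor map, execute the following command:</p>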
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">mon</span> <span class="n">stat</span>
</pre></div>
</div>
<p>Or:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">mon</span> <span class="n">dump</span>
</pre></div>
</div>
<p>To check the quorum status of the monitor cluster, execute the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">quorum_status</span>
</pre></div>
</div>
<p>Ceph returns the quorum status. For example, a Ceph cluster consisting of three monitors may return the following:</p>
<div class="highlight-javascript notranslate"><div class="highlight"><pre><span></span><span class="p">{</span> <span class="s2">&quot;election_epoch&quot;</span><span class="o">:</span> <span class="mf">10</span><span class="p">,</span>
  <span class="s2">&quot;quorum&quot;</span><span class="o">:</span> <span class="p">[</span>
        <span class="mf">0</span><span class="p">,</span>
        <span class="mf">1</span><span class="p">,</span>
        <span class="mf">2</span><span class="p">],</span>
  <span class="s2">&quot;quorum_names&quot;</span><span class="o">:</span> <span class="p">[</span>
        <span class="s2">&quot;a&quot;</span><span class="p">,</span>
        <span class="s2">&quot;b&quot;</span><span class="p">,</span>
        <span class="s2">&quot;c&quot;</span><span class="p">],</span>
  <span class="s2">&quot;quorum_leader_name&quot;</span><span class="o">:</span> <span class="s2">&quot;a&quot;</span><span class="p">,</span>
  <span class="s2">&quot;monmap&quot;</span><span class="o">:</span> <span class="p">{</span> <span class="s2">&quot;epoch&quot;</span><span class="o">:</span> <span class="mf">1</span><span class="p">,</span>
      <span class="s2">&quot;fsid&quot;</span><span class="o">:</span> <span class="s2">&quot;444b489c-4f16-4b75-83f0-cb8097468898&quot;</span><span class="p">,</span>
      <span class="s2">&quot;modified&quot;</span><span class="o">:</span> <span class="s2">&quot;2011-12-12 13:28:27.505520&quot;</span><span class="p">,</span>
      <span class="s2">&quot;created&quot;</span><span class="o">:</span> <span class="s2">&quot;2011-12-12 13:28:27.505520&quot;</span><span class="p">,</span>
      <span class="s2">&quot;features&quot;</span><span class="o">:</span> <span class="p">{</span><span class="s2">&quot;persistent&quot;</span><span class="o">:</span> <span class="p">[</span>
                        <span class="s2">&quot;kraken&quot;</span><span class="p">,</span>
                        <span class="s2">&quot;luminous&quot;</span><span class="p">,</span>
                        <span class="s2">&quot;mimic&quot;</span><span class="p">],</span>
        <span class="s2">&quot;optional&quot;</span><span class="o">:</span> <span class="p">[]</span>
      <span class="p">},</span>
      <span class="s2">&quot;mons&quot;</span><span class="o">:</span> <span class="p">[</span>
            <span class="p">{</span> <span class="s2">&quot;rank&quot;</span><span class="o">:</span> <span class="mf">0</span><span class="p">,</span>
              <span class="s2">&quot;name&quot;</span><span class="o">:</span> <span class="s2">&quot;a&quot;</span><span class="p">,</span>
              <span class="s2">&quot;addr&quot;</span><span class="o">:</span> <span class="s2">&quot;127.0.0.1:6789/0&quot;</span><span class="p">,</span>
              <span class="s2">&quot;public_addr&quot;</span><span class="o">:</span> <span class="s2">&quot;127.0.0.1:6789/0&quot;</span><span class="p">},</span>
            <span class="p">{</span> <span class="s2">&quot;rank&quot;</span><span class="o">:</span> <span class="mf">1</span><span class="p">,</span>
              <span class="s2">&quot;name&quot;</span><span class="o">:</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span>
              <span class="s2">&quot;addr&quot;</span><span class="o">:</span> <span class="s2">&quot;127.0.0.1:6790/0&quot;</span><span class="p">,</span>
              <span class="s2">&quot;public_addr&quot;</span><span class="o">:</span> <span class="s2">&quot;127.0.0.1:6790/0&quot;</span><span class="p">},</span>
            <span class="p">{</span> <span class="s2">&quot;rank&quot;</span><span class="o">:</span> <span class="mf">2</span><span class="p">,</span>
              <span class="s2">&quot;name&quot;</span><span class="o">:</span> <span class="s2">&quot;c&quot;</span><span class="p">,</span>
              <span class="s2">&quot;addr&quot;</span><span class="o">:</span> <span class="s2">&quot;127.0.0.1:6791/0&quot;</span><span class="p">,</span>
              <span class="s2">&quot;public_addr&quot;</span><span class="o">:</span> <span class="s2">&quot;127.0.0.1:6791/0&quot;</span><span class="p">}</span>
           <span class="p">]</span>
  <span class="p">}</span>
<span class="p">}</span>
</pre></div>
</div>
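<p>One way to use this output in a periodic check is to compare the monitors listed in the monmap with those named in the quorum. The sketch below is illustrative only: the function name <code class="docutils literal notranslate"><span class="pre">missing_from_quorum</span></code> is hypothetical, the field names are taken from the sample above, and the sample has been trimmed and altered so that monitor <code class="docutils literal notranslate"><span class="pre">c</span></code> has dropped out of the quorum.</p>

```python
import json


def missing_from_quorum(status):
    """Return names of monitors in the monmap that are not in the quorum."""
    in_quorum = set(status["quorum_names"])
    all_mons = [mon["name"] for mon in status["monmap"]["mons"]]
    return [name for name in all_mons if name not in in_quorum]


# Trimmed variant of the sample above in which monitor "c" is out of quorum.
SAMPLE = json.loads("""
{ "quorum": [0, 1],
  "quorum_names": ["a", "b"],
  "quorum_leader_name": "a",
  "monmap": { "mons": [ {"rank": 0, "name": "a"},
                        {"rank": 1, "name": "b"},
                        {"rank": 2, "name": "c"} ] } }
""")

print(missing_from_quorum(SAMPLE))  # prints ['c']
```

<p>An empty list means that every monitor in the monmap is participating in the quorum.</p>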
</div>
<div class="section" id="mds">
<h2>Checking MDS Status<a class="headerlink" href="#mds" title="Permalink to this headline">¶</a></h2>
<p>Metadata servers provide metadata services for CephFS. Metadata servers have two sets of states: <code class="docutils literal notranslate"><span class="pre">up</span> <span class="pre">|</span> <span class="pre">down</span></code> and <code class="docutils literal notranslate"><span class="pre">active</span> <span class="pre">|</span> <span class="pre">inactive</span></code>. To ensure that your metadata servers are <code class="docutils literal notranslate"><span class="pre">up</span></code> and <code class="docutils literal notranslate"><span class="pre">active</span></code>, execute the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">mds</span> <span class="n">stat</span>
</pre></div>
</div>
<p>To display details of the metadata cluster, execute the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">fs</span> <span class="n">dump</span>
</pre></div>
</div>
</div>
<div class="section" id="id13">
<h2>Checking Placement Group States<a class="headerlink" href="#id13" title="Permalink to this headline">¶</a></h2>
<p>Placement groups map objects to OSDs. When you monitor your placement groups, you want them to be
<code class="docutils literal notranslate"><span class="pre">active</span></code> and <code class="docutils literal notranslate"><span class="pre">clean</span></code>. For a detailed discussion, refer to <a class="reference external" href="../monitoring-osd-pg">Monitoring OSDs and Placement Groups</a>.</p>
</div>
<div class="section" id="rados-monitoring-using-admin-socket">
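<span id="id15"></span><h2>Using the Admin Socket<a class="headerlink" href="#rados-monitoring-using-admin-socket" title="Permalink to this headline">¶</a></h2>
<p>The Ceph admin socket allows you to query a daemon via a socket interface. By default, Ceph sockets reside under
<code class="docutils literal notranslate"><span class="pre">/var/run/ceph</span></code>. To access a daemon via the admin socket, log in to the host that runs the daemon and use one of the following commands:</p>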
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">daemon</span> <span class="p">{</span><span class="n">daemon</span><span class="o">-</span><span class="n">name</span><span class="p">}</span>
<span class="n">ceph</span> <span class="n">daemon</span> <span class="p">{</span><span class="n">path</span><span class="o">-</span><span class="n">to</span><span class="o">-</span><span class="n">socket</span><span class="o">-</span><span class="n">file</span><span class="p">}</span>
</pre></div>
</div>
<p>For example, the following two commands are equivalent:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">daemon</span> <span class="n">osd</span><span class="mf">.0</span> <span class="n">foo</span>
<span class="n">ceph</span> <span class="n">daemon</span> <span class="o">/</span><span class="n">var</span><span class="o">/</span><span class="n">run</span><span class="o">/</span><span class="n">ceph</span><span class="o">/</span><span class="n">ceph</span><span class="o">-</span><span class="n">osd</span><span class="mf">.0</span><span class="o">.</span><span class="n">asok</span> <span class="n">foo</span>
</pre></div>
</div>
<p>To view the available admin socket commands, execute the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ceph</span> <span class="n">daemon</span> <span class="p">{</span><span class="n">daemon</span><span class="o">-</span><span class="n">name</span><span class="p">}</span> <span class="n">help</span>
</pre></div>
</div>
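<p>The admin socket commands enable you to show and set your configuration at runtime. See <a class="reference external" href="../../configuration/ceph-conf#viewing-a-configuration-at-runtime">Viewing a Configuration at Runtime</a> for details.</p>
<p>Additionally, you can set configuration values at runtime directly. The admin socket bypasses the monitor, unlike
<code class="docutils literal notranslate"><span class="pre">ceph</span> <span class="pre">tell</span> <span class="pre">{daemon-type}.{id}</span> <span class="pre">config</span> <span class="pre">set</span></code>, which relies on the monitor but does not require you to log in directly to the host in question.</p>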
</div>
</div>



           </div>
           
          </div>
          <footer>
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
        <a href="../monitoring-osd-pg/" class="btn btn-neutral float-right" title="Monitoring OSDs and Placement Groups" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
        <a href="../health-checks/" class="btn btn-neutral float-left" title="Health checks" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
    </div>

  <hr/>

  <div role="contentinfo">
    <p>
        &#169; Copyright 2016, Ceph authors and contributors. Licensed under Creative Commons Attribution Share Alike 3.0 (CC-BY-SA-3.0).

    </p>
  </div> 

</footer>
        </div>
      </div>

    </section>

  </div>
  

  <script type="text/javascript">
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script>

  
  
    
   

</body>
</html>